perm filename NOTES.END[LSP,JRA]9 blob sn#134865 filedate 1974-12-02 generic text, type C, neo UTF8
.SEC(The dynamic structure of LISP,dynamic structure,,P107:)
.TURN ON "←";

.SS(Introduction)

As things presently stand we have the  basic static structure of a LISP machine consisting
of %3eval%* and friends, the I/O routines, the garbage
collector, an initial symbol table which believes in the primitive 
functions and  some commonly needed utility functions. 
We have two areas of memory: free space, and full word space.

Expressions (forms)  to be evaluated are read in, converted to list structure,
and evaluated by %3eval%*.  What %3eval%* is doing is traversing the 
sexpression representation of the form, interpreting the 
information found there as LISP "instructions".  This is an expensive process.
In the general case there is nothing we can do to reduce this cost, but very
frequently there is a mapping from the LISP expressions to be 
evaluated to a sequence of actual machine instructions, which when
carried out will have the same computational effect as %3eval%* massaging
the list structure.  This mapping is called a ⊗→compiler↔←. 
So a compiler is simply a useful tool for increasing the  speed of execution.
It adds nothing conceptually. Indeed, it is a hopeful sign to see Computer
Scientists moving away from their preoccupation with compilers. The current
difficulties in the field are not to be solved by faster or smarter compilers.

We have said nothing about how the actual execution of the expression
takes place. The essential ingredient involved here  is the handling
of control -- the dynamics of LISP execution.
How can we implement call-by-value, parameter passing, recursion, and 
conditional expressions?
This chapter will deal with the implementation of the control structures
of LISP. 
We will describe implementations of these features as we develop
compilers for LISP.
This use of compilers has many motivations:

%21.%*  To describe the compiler we must carefully define the machine
which will execute the instructions which the compiler produces.
The code generated by the compiler will reference the control primitives of the 
machine, so we might as well build a compiler at the same time we
show the dynamic structure of LISP.

%22.%*  The design of the compiler shows a very non-trivial application of
non-numerical computation. At the same time we will see how simple it is
to describe such complex algorithms in LISP.

%23.%*  It will clearly show the relationship between compilation and
evaluation. That is, the LISP function representing the compiler
will very closely parallel the structure of the interpreter, %3eval%*.
If this is understood, then the compiler is easy.

%24.%*  Everyone should see a compiler explained.

As in the previous chapter, we will attempt to remain as abstract and aloof
as possible without losing the necessary precision. But a meaningful
discussion of compilers entails an understanding of a machine.
So before the actual construction of the compilers, we will describe a
simple machine, %aSM%*,
with a sufficient instruction set to handle the control structures of LISP.

.SS(%aSM%*:A Simple machine,%aSM%*,P33:)
This section describes a simple machine which
has a sufficient instruction set to handle an implementation of LISP.
Before we begin, note that this machine is %2not%* necessary for our
understanding of %3eval%*. %3eval%* is self-contained. We need only describe
a machine to discuss implementation and compilation. This
indeed is an objection to describing the meaning of programming languages
in terms of a compiler -- you must understand %2two%* things now -- the
language %2and%* a machine.

The simple machine, %aSM%*, has a slight similarity to the organization of the
PDP-10. One need very few features to adequately discuss the interesting
facets of implementation
of our LISP subsets. Certainly, if we were to undertake a real implementation,
many more instructions would be necessary. Similarly, when we discuss compilation,
our %aSM%* suffices, but if we wished to perform %2efficient%* compilation,
one would hopefully have a better instruction set. The point here
is to understand basic algorithms. If that is accomplished it is quite
straightforward to examine problems of efficiency and details of implementation.

%aSM%* has a conventional addressable main memory, and %3n%* special registers,
AC1, AC2, ..., ACn. These registers are called %2accumulators%* and are to
contain pointers to the arguments of a function at the time of invocation.
Each memory location or register is assumed to be large enough to contain
two addresses. For sake of discussion, assume the word size is 36 bits;
the addressing space is then 2%818%*.
The mapping of a LISP cell onto an %aSM%* location is easy: the %3car%*
maps to the left half of the word; the %3cdr%*, to the right.
A memory area to contain full words is designated;
all pnames and numbers are stored there.

Parts of %aSM%* memory can be designated as stacks. Each stack is a contiguous
area of memory, and the current top of the stack is referenced by a register
called a %2stack%*- or %2pushdown%*-pointer.
The stacks will be used to contain the partial results of calculations and
will contain the information necessary to implement recursive function calling.
The primary LISP stack is named P.

There are only three main classes of instructions necessary to describe
LISP implementation: instructions for constant generation, instructions for
stack manipulation, and instructions for control of flow.
The control instructions and some of the stack instructions refer to the
program counter of %aSM%*, designated %aPC%*. In the following, %3C%*( ) means
"contents of".

.BEGIN TABIT1(19);TURN OFF"←";

MOVEI ACi const\%3C%*(ACi) ← const. Load the constant into ACi.

PUSH P ACj\%3C%*(P) ← %3C%*(P)+1
\%3C%*(%3C%*(P)) ← %3C%*(ACj). Push contents of ACj onto top of stack.

POP P ACj\%3C%*(ACj) ← %3C%*(%3C%*(P)). Pop top of stack into ACj.
\%3C%*(P) ← %3C%*(P)-1

POPJ P\%3C%*(%aPC%*) ← %3C%*(%3C%*(P)). Pop top of stack into %aPC%*; used as return.
\%3C%*(P) ← %3C%*(P)-1

.BEGIN INDENT 0,19;FILL;
CALL i fn\This instruction handles function calling. i is the number of arguments
(assumed to be in AC1 through ACi at time of call). fn represents a function name.
One effect of CALL is to leave return information on the stack, P.
.END

MOVE ACi -j P\%3C%*(ACi) ← copy of the j%8th%* element (counting from zero%8th%*) of stack P.

MOVEM ACi -j P\j%8th%* element of stack P is replaced by contents of ACi.

.BEGIN INDENT 0,19;FILL;
SUB x y\%3C%*(x) ← %3C%*(x) - %3C%*(y); Used in the context, SUB P const, to remove
a block of cells from the top of the stack.
.END

JRST j\%3C%*(%aPC%*) ← j. Go to loc j.

JUMPE ACi j\%2if%* %3C%*(ACi)=0 %2then%* %3C%*(%aPC%*) ← j;

JUMPN ACi j\%2if%* %3C%*(ACi)%c≠%*0 %2then%* %3C%*(%aPC%*) ← j;

.END
For much of the sequel, the only calling sequence of interest will be
ordinary function calls.  The evaluated arguments are to be loaded into
the accumulators.
Internal λ-expressions are not handled yet. (See problem IV on {yon(P36)}).
More instructions for %aSM%* will be given on {yon(P35)} when we
discuss efficiency.
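
To get a feeling for these conventions before building any compilers, here is a
plausible hand-coded %aSM%* sequence for evaluating a composition like %3f[g[(A . B)]]%*,
anticipating the convention (made explicit in the next section) that a function
leaves its value in AC1:

.BEGIN TABIT1(19);SELECT 3;GROUP;

\MOVEI 1 (A . B)
\CALL 1 g
\CALL 1 f
\POPJ P

.END
%3g%*'s argument is placed in AC1, %3g%* leaves its value there, and that is exactly
where %3f%* expects its single argument; the %3POPJ%* returns with the final value
still in AC1.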

.SS(On Compilation,compiler)

The relationship between compilation and interpretation should be coming
clear:  the interpreter performs the evaluation; the
compiler emits instructions which when executed produce the same computational
effect as the evaluator.  
Since the code produced by the compiler is either in  machine language
or in a form closer to the machine than the source program, we can execute
the code much faster. A speed-up factor of ten is not uncommon.
Also in LISP we know that programs are stored as sexpressions.
Our compiled code will be machine language, thus freeing a large quantity
of sexpression storage. This will usually make garbage collection 
be less time consuming.
Why not compile all programs? We already have seen that we can %3cons%*-up
new expressions to be evaluated while we are running. Still we can force
even those expressions through a compiler before execution. The answer,
rather, is that for debugging and editing of programs it is extremely convenient
to have a structured representation of the program 
in memory. 

This structured representation  also simplifies the discussion of compilation.
Conventional compiler discussions always include description of the syntax analysis
problems. For we  cannot  compile code until we know what the syntactic structure
of the program is. The problem is that  syntax analysis is really irrelevant
for a clear understanding of the function of a compiler.
Assuming the existence of the structured representation the compiler
is conceptually very simple.

A few soothing words to forestall mutiny:  Though it is
true that syntax-directed compilation (see {yonss(P4)}) can be used to go directly
from external program to internal machine code, our contention is that
sophisticated programming systems %2need%* the structured representation
(parse tree). 
Other processors in a modern system -- the editors and the debuggers,
to mention two -- can profitably fondle the parse tree. When
we wish to run the program at top speed, we can call the compiler.
The compiler can then translate the abstract representation of the program
into machine code.
We shall say more about this view of programming later.


We shall exploit the analogy between compilers and evaluators when we write the
LISP function, ⊗→%3compile%*↔←, which will describe the compiler.
We shall do this in two ways. First, the structure of the %3compile%* function
and its subfunctions will be written to capitalize on the knowledge we
have acquired from writing evaluators. Second and more difficult, we will attempt
to abstract from the specific compiler, the essence of LISP compilers without
losing all of the details. At the most general level a compiler simply
produces code which when executed has the same effect as evaluating the original
form, but this description has lost too much detail.
This second task is more difficult because we have to separate two representations
from the specific compiler. We are representing forms as data structures; and we
are also dealing with the representation of a specific machine.
The task %2is%* worth pursuing since we wish to write different compilers
for the same machine and also  a single compiler capable of easy transportation to
other machines. 

The input to
%3compile%* is the sexpr representation of a LISP function; the output is a list
which represents a sequence of machine instructions.  Assume that we have
LISP running on %aBrand X%* machine, and we have just written the LISP function,
%3compile%*, which produces code for %aBrand X%* machine. Then:
.GROUP

	1.  Write the Sexpression form of %3compile%*.

	2.  Read in this translation, defining %3compile%*.

	3.  Ask LISP to evaluate:  %3compile[COMPILE]%*.
.APART

Now since %3compile%* is a function of one argument which purports to compile code
for %aBrand X%* machine, translating the sexpression representation of its argument
to machine code, the output of step 3. should be a list of instructions 
representing the translation of the function %3compile%*.  That is, step 3. compiles
the compiler!!

A technique called %2⊗→bootstrapping↔←%* is closely related to the process described
above. Assume that we have LISP and its compiler running on %aBrand X%*, and
we wish to implement LISP (quickly) on %aBrand Y%*. If we have been careful in
our encoding of the %3compile%* function then %2most%* of %3compile%* is
machine independent, that is it deals mostly with the structure of LISP and
only in a few places deals with the structure of the particular machine.  As we
will see this is not too  great an assumption to make.
Notice that this is one of our admonitions:  encode algorithms in a 
representation-independent style and include representation-dependent
routines to interface with the program. To change representations simply
requires changing those simpler subfunctions. Here the representations are
machines and the algorithm is a compiling function for LISP.

Let us call those parts of the compiler
which deal with the machine, the ⊗→code generators↔←. Now if we understand the
machine organization of brands %aX%* and %aY%* then for any instruction on %aBrand X%*
we should be able to give a (sequence of) instructions having the equivalent
effect on %aBrand Y%*. In this manner we can change the code generators in %3compile%*
to generate code to run on %aBrand Y%*. We thus would have a %3compile%* function,
call it %3compile*%*, running on %aX%* and producing code for %aY%*.
Now if we take the Sexpr forms of %3eval, apply, read, print, compile,...%* etc.
and pass these functions through %3compile*%*, we will get a large segment
of a LISP system which will run on %aY%*. Certain primitives will have to be
supplied to run this code on %aY%*, but a very large part of LISP can be
bootstrapped from %aX%* to %aY%*.

Obviously, before we can use %2a%* compiled version of the compiler (or
in fact, before we can use any compiled code) we must have some means of
translating the list output of the compile function into real instructions in the
machine.  So if the version of LISP which we are implementing is to have a
compiler we must allocate an area of memory which can receive the compiled code.
This area is usually called %2⊗→Binary Program Space↔←%* (BPS).  The translation program
which takes the list of output from the compiler and converts it into actual
machine instructions for ⊗→BPS↔← is called an assembler. Before discussing ⊗→assemblers↔←
we will  examine a sequence of simple compilers corresponding to the LISP
subsets evaluated by ⊗→%3tgmoaf%*↔←, ⊗→%3tgmoafr%*↔← and finally an ⊗→%3eval%*↔← which
handles local variables.

.SS(Compilers for subsets of LISP)

We will examine  compilers for increasingly complex subsets of LISP
beginning with functions, composition and constant arguments and
ending with a more realistic compiler for a reasonable subset of
pure LISP. Though each subset is a simple extension of its predecessor,
each subset introduces a new problem to be solved by the compiling algorithm.
If the corresponding evaluator (⊗→%3tgmoaf%*↔←, ⊗→%3tgmoafr%*↔← and the most
simple ⊗→%3eval%*↔←) is well understood, then the necessary steps
to be added to the compiler are easy to make.

First we need to begin describing a list of conventions which will remain in
effect throughout this sequence of compilers.  The code which is generated
must be compatible with that which incorporates the basic LISP machine.
In particular the calling conventions for function invocation must be
maintained.  

.GROUP
.BEGIN CENTERIT;SELECT 2;
←Compiling Conventions
.END

.BEGIN INDENT 0,10,10;
%21.%* A function of %3n%* arguments expects its arguments to
be presented in AC1 through ACn. Also the execution of any code which is
computing arguments to a function call must store temporary results on the stack,
%3P%*. 
Thus the execution sequence of code for computing %3n%* arguments
should be: 
.END

.BEGIN centerit;
←compute value of argument%41%*, push onto stack, %3P%*.

←..........
←compute value of argument%4n%*, push onto stack, %3P%*.
.END
.APART

.BEGIN INDENT 10,10,10;GROUP
So after this computation the values of the arguments are stored
V%4n%*, V%4n-1%*, ..., V%41%*, from top to bottom in %3P%*.
Thus to complete the function invocation, we need only pop the arguments
into the AC's in the obvious order. 
.END

.BEGIN INDENT 0,10,10;GROUP
%22.%* When the function completes evaluation it is to place that value
in AC1. Nothing is said about the contents of any of the other AC's.

.APART
.GROUP
%23.%* This brings us to one of the most important conventions
for %2any%* compiler.
We must always be sure that the integrity of the stack is maintained. Whatever
we push onto the stack within the body of a function %6must%* be
popped off before we exit from the function body. That is, the state of the stack
must be transparent to any computations which occur within the function.
.APART
.GROUP


%24.%* Finally some details of the output from the compilers: the output
is to be a list of instructions, sequenced in the order which we
would expect to execute them.  Each instruction is a list: an operation
followed by as many elements as are required by that operation.
The format for calling an %3n%*-ary function, %3fn%*, is:
.BEGIN CENTERIT;SELECT 3;
(CALL n (E fn))
.END
The %3E%*-construct will be explained in {yonss(P7)}.
And instead of referring to AC1, AC2, ... ACn we will simply
use the numbers, %31, 2, ...,n%* in the instructions.
.APART
.END

The first compiler will handle (the S-expression translation of) that
subset of LISP forms defined by the following BNF equations:

.BEGIN TABIT1(11);GROUP

<form>\::= <constant> | <function>[<arg>; ...; <arg>] | <function> [ ]

<arg>\::= <form>

<constant>\::= <sexpr>

<function>\::= <identifier>

.END

This LISP subset corresponds closely with ⊗→%3tgmoaf%*↔←, handling only
function names, composition, and constant arguments.
Our previous compiling conventions will keep us in good stead. 
We might look more closely at the compilation of constants. An S-expression
constant will be seen as  %3(QUOTE ...)%* by the compiler. Since the
convention %22%* says the value of an expression is to be found in AC1, the
instruction which we wish to generate is:


.BEGIN TABIT1(30);SELECT 3;GROUP

\(MOVEI 1 (QUOTE ...))
.END
It should now be clear how to proceed with the first expression-compiler.
In the interest of readability and generality, we will write the functions using
constructors, selectors, and recognizers and supply the necessary
bridge to our specific representation by simple sub-functions.

.BEGIN SELECT 3;TABIT2(11,18);
.GROUP

compexp <=\λ[[exp]\[isconst[exp] → list[mkconst[exp]];
\\ T → compapply[fun[exp];complis[args[exp]];length[args[exp]]]] ]
.APART
.GROUP

complis <=\λ[[u]\[null[u] → NIL;
\\ T → append[compexp[first[u]]; list[mkpush[]]; complis[rest[u]]]] ]
.APART

.P82:
compapply <= λ[[fn;args;n]append[args;loadac[n];list[mkcall[n;fn]]]]

.APART
.P56:
.GROUP
loadac <=\λ[[n]\[zerop[n] → NIL;
\\ T → concat[mkpop[n];loadac[n-1]]]]
.APART
.SELECT 1;
.BEGIN TABIT3(5,28,48);SELECT 2;GROUP;
\Recognizer\Selector\Constructor
.SELECT 3;

isconst 
 <= λ[[x]eq[first[x];QUOTE]]\\mkconst <= λ[[x] list[MOVEI; 1; x]]
\\fun <= λ[[x]first[x]]\mkpush <= λ[[](PUSH P 1)]
\\args <= λ[[x]rest[x]]\mkcall <= λ[[n;fn]list[CALL; n; list[E;fn]]]
\\\mkpop <= λ[[n] list[POP;P;n]]

.END


.END
Note that the functions %3car, cdr,%1 and %3cons%* do not appear
anywhere in this compiler.
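
To see the pieces working together, here is a hand evaluation (under the
conventions above) of the compiler applied to a small form:

.BEGIN TABIT1(20);SELECT 3;GROUP;

compexp[(F (QUOTE A)(QUOTE B))] %1gives:%*

\((MOVEI 1 (QUOTE A))
\ (PUSH P 1)
\ (MOVEI 1 (QUOTE B))
\ (PUSH P 1)
\ (POP P 2)
\ (POP P 1)
\ (CALL 2 (E F)))

.END
%1The two arguments are computed and pushed, then popped into AC2 and AC1 in that
order so that the first argument ends up in AC1, and finally %3f%* is called.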

The code generated by this compiler is clearly inefficient, but that
is not the point. We wish to establish an intuitive and,
hopefully, correct compiler, then later 
worry about efficiency. Worrying about efficiency before establishing
correctness is the ultimate folly.

Let's look at the possibility of compiling a larger set of LISP constructs.
The innovation which occurred in %3tgmoafr%* was the appearance of conditional 
expressions. To describe conditional expressions, the above BNF equations
were augmented by:

←<form> ::= [<form> → <form> ; ... <form> → <form>]

Certainly the addition of conditional expressions will mean an extra 
piece of code in %3compexp%*
to recognize %3COND%* and a new function (analogous to %3evcond%* in %3tgmoafr%*)
to generate the code for the %3COND%*-body.
In fact, the only difference between %3evcond%* and its counterpart in 
%3compexp%*, which we shall call %3comcond%*, is that %3comcond%* generates
code which when executed will produce the same value as the value
produced by %3evcond%* operating on the given S-expression.

The effect of %3comcond%* on a conditional of the form:

.P103:
(%2*1*%*)←%3(COND (%1p%41%* e%41%3) ... (%1p%4n%* e%4n%3))%1 should be clear.

First generate code for p%41%*; generate a test for truth, going to the
code for e%41%* if true, and going to the code for p%42%* if not true.
The code for e%41%* must be followed by an exit from the code for the
conditional, and we should probably generate an error condition to be executed
in the case that p%4n%* is false.

.BEGIN NOFILL;GROUP;
Pictorially, we might represent the code as:

		<code for p%41%*>




 
	        T			NIL


	<code for e%41%*>		<code for p%42%*>


				     T			NIL


		<code for e%42%*>		        <code for p%43%*>

		     ....          			....



					     T			NIL
	

				<code for e%4n%*>		<code for error>






.END

This requires the establishment of more conventions for our compiler.

.GROUP
.BEGIN CENTERIT;SELECT 2;
←More Compiling Conventions
.END

.BEGIN INDENT 0,10,10;
In particular we must be able to test for %3T%* (or %3NIL%*). We will
define %3NIL%* to be the zero of the machine, and we will test for
%3NIL%* using the %3JUMPE%* instruction
(see {yonss(P33)}). We will test AC1 since the value returned
by p%4i%* will be found there.
.END
.APART

.BEGIN INDENT 0,10,10;
Next, since our code is to be a %2sequence%* of instructions,
we must linearize this graph representation of the generated code.
We will utilize a standard trick using "labels" and "go to"s.
The "go to" instruction, %3JRST%*, is described in {yonss(P33)}.
.END
Thus  for the compilation of *1*, {yon(P103)},
we will generate:

.BEGIN TABIT2(30,35);GROUP

\%3(\%1<code for p%41%*>%3
\\(JUMPE 1, L1)%1
\\<code for e%41%*>%3
\\(JRST L)
\L1\%1<code for p%42%*>%3
\\(JUMPE 1, L2)
\        ...
\Ln-1\%1<code for p%4n%*>%3
\\(JUMPE 1, Ln)%1
\\<code for e%4n%*>%3
\\(JRST L)
\Ln\(JRST ERROR)
\L\   )
.END
%1

To generate such code we need to be able to generate the labels, %3Li%*.
The generated labels should be atomic and should be distinct. LISP has
a function, ⊗→%3gensym%*↔←, which can be used for this task. %3gensym%*
is a function of no arguments  whose value is a generated symbol or atom
which is guaranteed to be distinct from any atom generated in any of
the usual manners. In many versions of LISP, gensyms are of the form
%3Gn%* where %3n%* is a four digit number beginning at %30000%*. Thus
the first call, %3gensym[ ]%*, would give %3G0000%*; succeeding calls would give
%3G0001, G0002%*, etc.
First, gensyms are not placed in the object list since
they are usually used only for their unique name. Second, if you explicitly
place them in the symbol table they are obviously distinct from each other and
will be distinct from all other atoms unless, of course, you read in such an atom.

What we must now do is write a LISP function, call it %3comcond%*,
which will generate such code. %3comcond%* will 
be recursive. We must thus determine what code gets generated on each
recursion and what code gets generated at the termination case.
Looking at the example we see that the block 

.BEGIN CENTERIT;
←%3(%1<code for p%4i%*> (JUMPE 1 Li), ... %3(JRST L) Li)%1
.END
is the natural segment for each recursion and that:
.BEGIN CENTERIT;
←%3((JRST ERROR) L)%1
.END
is a natural for the termination case. We notice too that within each
block we need a "local" label, and that within each block,
including the terminal case, we refer to the
label %3L%* which is "global" to the whole conditional.

Without too much difficulty we can now add the recognizer for
%3COND%* to %3compexp%* and construct %3comcond%*.

.BEGIN SELECT 3; TABIT3(27,34,45);TURN OFF"←";GROUP;
.FILL

%1Add%* iscond[exp] → comcond[args[exp];gensym[ ]]; %1to%3 compexp 
%1where:
.NOFILL
%3

comcond <= λ[[u;glob][null[u] → list[mkerror[ ];glob];
\\T →append[comclause[first[u]; gensym[];glob];
\\\comcond[rest[u]; glob]]]]

comclause <=λ[[p;loc;glob]append[compexp[ante[p]];
\\\list[mkjumpe[loc]];
\\\compexp[conseq[p]];
\\\list[mkjrst[glob]];
\\\ list[loc]]]

.END

.BEGIN CENTERIT;SELECT 3;GROUP;
←iscond <= λ[[x]eq[first[x]; COND]]

←args <= λ[[x] rest[x]]
←ante <= λ[[c] first[c]]
←conseq <= λ[[c] second[c]]

←mkerror <= λ[[](JRST ERROR)]
←mkjumpe <= λ[[l]list[JUMPE;1;l]]
←mkjrst <= λ[[l]list[JRST;l]]

.END
.SELECT 1;

.BEGIN GROUP;TABIT2(30,38);
The partially exposed recursive structure of %3comcond%* would show:

←%3comcond[((%1p%41%*  e%41%3) ...(%1p%4n%*  e%4n%3));G0000]=

\%3(%1\<code for p%41%*>%3
\\(JUMPE AC1, G0001)%1
\\<code for e%41%*>%3
\\(JRST G0000)
\ G0001\comcond[((%1p%42%* e%42%3) ...(%1p%4n%* e%4n%3));G0000])

.END
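
.BEGIN GROUP;TABIT2(30,38);
%1Carried to completion for a two-clause conditional (with labels generated in the
order %3G0000, G0001, G0002%*), the same recursion yields:%*

←%3comcond[((%1p%41%*  e%41%3) (%1p%42%*  e%42%3));G0000]=

\%3(%1\<code for p%41%*>%3
\\(JUMPE AC1, G0001)%1
\\<code for e%41%*>%3
\\(JRST G0000)
\ G0001\%1<code for p%42%*>%3
\\(JUMPE AC1, G0002)%1
\\<code for e%42%*>%3
\\(JRST G0000)
\ G0002\(JRST ERROR)
\ G0000\   )

.END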

.SELECT 1;
Before attacking the major problem of our compilers, the handling of
variables, we shall describe how the list representing the output
of the compiler is turned into code which runs on the machine.


.SS(Problems)

.BEGIN TABIT1(16);
I. Evaluate: 
%3compexp[(COND(\(EQ (QUOTE A)(QUOTE B))(QUOTE C))
\((NULL (QUOTE A))(QUOTE FOO)))]


****more, more ****

.END
.SS(One-pass Assemblers,assemblers,P7:)

Define %3compile%* as:

%3compile <= λ[[fn;vars;exp]append[list[list[LAP;fn;SUBR]];compexp[exp]]]%*.

Consider the output from %3compile%* for the function:
.GROUP

%3←j [] <= f[g[A];h[B]]%* 

or more precisely, for the evaluation of

←%3compile[J;();(F(G(QUOTE A))(H(QUOTE B)))]%*.
.APART
.GROUP

It would be a list of the form:
.BEGIN TABIT2(10,45);SELECT 3;
\((LAP J SUBR)\%1; says %3j%* is a function%3
\(MOVEI AC1(QUOTE A))\%1; make argument for call on %3g
\(CALL 1 (E G))\%1; call the function%3
\(PUSH P AC1)\%1; save the value%3
\(MOVEI AC1 (QUOTE B))
\(CALL 1 (E H))
\(PUSH P AC1)
\(POP P AC2)
\(POP P AC1)
\(CALL 2(E F))
\(POPJ P)
\)

.END
.APART
%1

As you know, the machine representations of these instructions are encodings of
specific fields of specific machine locations with specific numbers.
For example, the operation %3PUSH%* is represented as a certain number,
called its %2⊗→operation code↔←%* or %2op code%*, and which will
occupy a certain area of a machine word so that the CPU can interpret
it as an instruction to push something onto a stack.  Other fields in the
instruction are to be interpreted as references to stacks, to memory locations,
to accumulators, constants or external references to other routines.
The purpose of an assembler is to translate these mnemonic instructions
into raw seething machine instructions.

Essentially all that the assembler need do is search ⊗→symbol tables↔←
for the opcodes, for subroutine names, for accumulator and stack names,
and store the resulting values in the appropriate machine locations.

Things are slightly more complex: we must also %6add%* information to
the tables as we proceed.  For example, as we assemble the code for a new
routine we must add its name and starting address to the current symbol
table.  The phrase, %3 (LAP %1name%3 SUBR)%* does this.

We must exercise a bit of care in handling  %3QUOTE%*d expressions.
Assembling a construct like %3(MOVEI AC1 (QUOTE (A B C)))%* should
have the effect of constructing the list %3(A B C)%* in free space
and placing an instruction in memory to load the address of this list into AC1.
What we must notice is that this list %3(A B C)%* is subject to
garbage collection and, if left unprotected, could be destroyed.
There are a couple of solutions. Perhaps the garbage collector could
look through compiled code for any references to free- or full-word- space;
or we could make a list of all of these constants and let the garbage
collector mark the list.

Much more problematic is the handling of labels and references to labels.
This case arose in the compiler for ⊗→%3tgmoafr%*↔← when we introduced
conditional expressions. Recall that the code for a ⊗→conditional expression↔←
of the form:

.GROUP
%3←(COND (%1p%41%* e%41%3) ... (%1p%4n%* e%4n%3)) %1 was:

.BEGIN TABIT1(35);

\    <code for p%41%*>
\    %3(JUMPE AC1 L1)%1
\    <code for e%41%*>
\    %3(JRST L)
\L1  %1<code for p%42%*>
\    %3(JUMPE AC1 L2)%1

\          ....
\    <code for e%4n%*>

\%3L        ....

.END
.APART
%1

The symbols, %3 L, L1,%* and %3L2%* in this example are labels.  Notice
that if we were to consider writing the assembler in LISP,
we could distinguish occurrences of labels from instructions using
the predicate, %3atom%*.

It perhaps is time to start thinking about writing such an assembler.
One of the arguments to the function should be the list representation
of the program.  One of its arguments should also describe where (in ⊗→BPS↔←)
we wish the assembled code to be located. We should also have access
to an initial symbol table, describing the opcodes and pre-defined
symbol names. Let's call the function, ⊗→%3assemble%*↔←.
%3assemble%* then can go %3cdr%*-ing down the program list, looking up
definitions and manufacturing the numerical equivalent of each 
instruction, then depositing that number in the appropriate machine
location.  If %3assemble%* comes across a label definition, it should
add information to a symbol table, reflecting that the label has
been seen and that it is associated with a specific location in memory.
Then future references to that label can be translated to references
to the associated machine location.  The only problem is that references
to labels may occur %6before%* we have come across the label definition 
in the program.  Such references are called %2⊗→forward reference↔←s%*.

.GROUP
For example, consider the following nonsense program:

.BEGIN TABIT1(35);SELECT 3;

\(  (LAP FOO SUBR)
\ X (JRST X)
\    (JRST Y)
\ Y (JRST X) )

.END
.APART
The reference to %3Y%* is a forward reference; neither of the references to %3X%*
is forward since %3X%* is defined before being referenced.


There are two solutions to the ⊗→forward reference↔← problem:

.BEGIN INDENT 0,5;
%21.%*  Make two passes through the input program.  The first pass
decides where the labels will be assigned in memory.  That is, this
pass builds a symbol table of labels and locations.  Then a second pass
uses this symbol table to actually assemble the code into the machine.
This works, but is not particularly elegant. It is expensive to read through
the input twice, particularly when we can do better.

%22.%*  The other solution is to make one clever pass through the input.
Whenever we come across a forward reference we add information to the symbol table
telling that a forward reference has occurred at this location.  We assemble
as much of the instruction as we can.  When a label definition %6does%*
occur we check the table to see if there are any forward references pending
on that label.  If there are such we  %6⊗→fix-up↔←%* those instructions to
reference the location now assigned to the label.

.END
A speed-up by a factor ranging from two to five is not uncommon for a good
one pass assembler.
There are some restrictions which are imposed on one-pass assemblers, but
particularly for assembling compiled code, one-pass assemblers are quite sufficient.


There are at least two ways to handle fixups and forward references.  If
the fixups are of a particularly simple nature, say only requiring fixups
to the address-part of a word, then we may link those pending forward references
together, chaining them on their address parts. Thus:

.GROUP SKIP 20;

Another solution, which is potentially more general (that is, it could
handle left-half, right-half or partial-word fixups) is to store the information
about each fixup in the symbol table under each label. Thus:

.GROUP SKIP 20;

Both methods are useful. Both have been used extensively in assemblers and
compilers.
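
To make the second scheme concrete, here is one way the label table might evolve
while assembling the nonsense program above at location 104 (the %3FIXUP%* entry
is only a suggestive notation):

.BEGIN TABIT2(25,48);GROUP;
\%2after assembling%*\%2label table%1
\%3X (JRST X)%1 at 104\%3((X . 104))%1
\%3(JRST Y)%1 at 105\%3((Y FIXUP 105) (X . 104))%1
\%3Y (JRST X)%1 at 106\%3((Y . 106) (X . 104))%1
.END

When the definition of %3Y%* is finally seen, the pending reference in location 105
is fixed up to address 106.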

To write the function, %3assemble%*, we will need two functions:
.BEGIN INDENT 0,10;

%21.%*  ⊗→%3deposit%*↔←%3[x;y]%*: %3x%* represents a machine address; %3y%* is a list,
representing the instruction to be deposited. %3y%* should be a list of
four elements: %3(%*opcode ,accumulator number, memory address, index register%3)%*
The value of %3deposit%* is the value of %3y%*.

%22.%*  ⊗→%3examine%*↔←%3[x]%*: %3x%* represents a machine address.  The value
of %3examine%* is the contents of location %3x%* in the form of a list as
specified above.
.END
.BEGIN TURN ON "#";

%3assemble%* needs to recognize that there are different instruction formats.
That is, some instructions use an opcode and a memory reference: %3(JRST#L)%*;
some use an opcode, accumulator, and an address: %3(PUSH#P#AC1)%*; and some
vary: %3(MOVE#AC1#(QUOTE A))%* and %3(MOVE#AC1#-1#P)%*. %3assemble%* also
has to have an initial symbol table of opcodes, accumulator numbers, and
stack numbers:
.END

.BEGIN TABIT2(35,45);GROUP;
%2\symbol\value%3
\MOVE\200
\MOVEI\201
\SUB\274
\JRST\254
\JUMPE\322
\JUMPN\326
\PUSH\261
\POP\262
\POPJ\263
\CALL\034
\AC1-n\1 - n
\P\14
.END
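
As an illustration, with this table the instruction %3(PUSH P AC1)%* would presumably
assemble with opcode %3261%*, the stack number %314%* (for %3P%*) in the accumulator
field, and %31%* (for AC1) in the address field; compare the assembled %3JRST%*s in
the example below.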

.GROUP
So for example:

←%3assemble[((LAP FOO SUBR) X (JRST X) (JRST Y) Y (JRST X));104]%* should

have the final effect:
.BEGIN TABIT1(43);

--→%7α~~~]%*--→%bPNAME%7[~~[~~]%1--→%7[~~]~~]%*--→%bSUBR%7[~~[~~]%1...
\|
\|
\|
\|        op   ac  ind  add
\→104  254    0   0   104
\  105  254    0   0   106
\  106  254    0   0   104

.END
.APART
.SS(A compiler for simple %3eval%*,compiler,P32:)

Consider the function defined by:

←%3j[x;y] <= f[g[x];h[y]]%*

We claim that the following code suffices for this function:
.BEGIN TABIT2(10,45);SELECT 3;

\(LAP J SUBR)\%1; says %3j%* is a function%3
\(PUSH P AC1)\%1; save the input args%3
\(PUSH P AC2)
\(MOVE AC1 -1 P)\%1; get %3x
\(CALL 1 (E G))\%1; call the function named %3g
\(PUSH P AC1)\%1; save the value%3
\(MOVE AC1 -1 P)\%1; get y%3
\(CALL 1(E H))\%1; call %3h
\(PUSH P AC1)
\(POP P AC2)\%1; restore the arguments in%3
\(POP P AC1)\%1;   preparation for%3
\(CALL 2(E F))\%1;   calling %3f
\(SUB P(C 0 0 2 2))\; %1sync the stack; remove the saved args%3
\(POPJ P)\; %1exit.%3 AC1%* still has the value from %3f.
\   )
.END
Here is a picture of the execution of the code:

.BEGIN TURN ON "\";NOFILL;TABS 5,10,25,30,45,50,65,70;SELECT 3;GROUP;

AC1: x ; AC2: y

\|\| (PUSH P AC1)\\ (PUSH P AC2)\|  y\|(MOVE AC1 -1 P)\| y | (CALL 1 (E G))
\|\|    =>\|  x\|     =>\|   x\|        =>\| x |  =>


AC1: g[x] ; AC2: ?\\\AC1: y ; AC2: ?
\\ (PUSH P AC1)\|g[x]\|(MOVE AC1 -1 P)\|g[x]\| (CALL 1(E H))
\|  y\|      =>\|  y\|      =>\|  y\|   =>
\|  x\|\|  x\|\|  x\|


AC1: h[y] ; AC2: ?\\\AC2: h[y]\AC1: g[x] ; AC2: h[y]
\|g[x]\| (PUSH P AC1)\|h[y]\|   (POP P AC2)\|g[x]\|  (POP P AC1)\| y |(CALL 2 (E F))
\|  y\|       =>\|g[x]\|      =>\|  y\|      =>\| x |   =>
\|  x\|\|  y\|\|  x\|
\    \ \|  x\|


AC1: f[g[x];h[y]]
\|  y\|(SUB P (C 0 0 2 2))\\|\| (POPJ P)
\|  x\|           =>    \=>


.END

.BEGIN INDENT 0,10;TURN ON "#";
Notes: The %3(MOVE %*x -n%3 P)%1 instruction retrieves a copy
of the contents of the n%8th%* element down the stack, placing it in x.  Notice too, that
the addressing in the code is relative to the top of the stack: %3(MOVE#AC1#-1#P)%*
gets us %3x%* in one instance and gets us %3y%* in another, because the top 
of the stack changes.
.END

Clearly we want a compiler to produce equivalent (albeit inefficient) code.
Once we understand how to do this it is relatively easy to improve on the efficiency.
We shall now extend %3compile%* to handle local variables.



.<<discussion of offset and vpl>>
.BEGIN TURN ON "#";
The major failing of the previous %3compile%* 
({yonss(P7)}) is its total lack of
interest  in variables. The handling of variables is  
a  non-trivial problem  which every
compiler-writer must  face.    You  have just seen an example  of  plausible  code
generation for simple LISP functions, in particular:

←%3j[x;y] = f[g[x];h[y]]%*

The crucial point is how to generate code which `knows'
where on  the stack we can find the value  of a (local) variable when
we execute that code.  What  we shall do is simulate the behavior  of
the runtime  stack while  we are  compiling the  code.   The compiler
cannot  know what the %2values%* of the  variables will be at runtime but
we claim that it should know %3where%* to find the values.  We will carry
this information through the  compiler in a manner reminiscent of the
`dotted-pair' symbol table of the first version of %3eval%* seen in {yonss(P6)}.  Instead  of
posting the current values  in the stack, the compiler  will post information about  the
positions  of the variables relative  to the top of  the stack at the
time we enter  the function.   The variable pair list, %3vpl%*, 
contains this information.  That is if we are
to compile  a function with λ-variables, %3[x;y;z]%* then %3vpl%* will begin
with:

←%3((X . 1), (Y . 2), (Z .3) ...%*

When we set up %3vpl%* 
we also set the %2⊗→offset↔←%*,  called %3off%*, to minus the number  of args added
to %3vpl%*, in this  case -3.  Now if we come across a reference say to
%3Y%* while compiling code, we  use %3cdr[assoc[Y;vpl]]%* to retrieve %32%*.   The
offset plus this retrieved value gives  us the relative position of %3Y%*
in  the stack.   I. e.,  -3 +  2 = -1.   Thus  to refer  to the stack
location of %3Y%*  we could  use %3(....-1#,#P)%*.  What  happens as we  add
elements to  the stack? (Or to  be more precise...what  happens as we
generate code which when  executed will add  elements to the  stack.)
Clearly we must modify the  offset.  If we add one  element, we would
set  %3off%* to -4.   Then to reference  %3Y%* now use  -4 + 2  = -2.   I. e. a
reference to %3Y%* is now of the form:

←%3( ......-2 P).%*

But that's right.  %3Y%*  is now further down in the run time  stack.  So
to  complete the analogy, the  `symbol table' is really  defined by %3off%*
plus the  current  %3vpl%*.

When will the compiler make modifications  to the top  of the stack?   First, we
push new  elements when we are compiling  the arguments to a function
call.  The function  in the new compiler which compiles the  argument
list is called %3complis%*:

.BEGIN SELECT 3; TABIT2(22,32);GROUP

complis <= λ[[u;off;vpl]\[null [u] → NIL
\ T → append\[compexp [first[u]; off; vpl];
\\ list[mkpush[]];
\\ complis [rest[u]; off-1; vpl]]]

mkpush <= λ[[](PUSH P 1)]
.END

Notice  that   %3complis%*  compiles  the   arguments  from  left   to  right,
interspersing  them with %3(PUSH#P#AC1)%* and recurring with  a new offset
reflecting the effect of the %3PUSH%*. Clearly this function is analogous to
%3evlis%*.
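
For instance, given the λ-variables %3[x;y]%* (so the entry offset is %3-2%*), a hand
evaluation of %3complis%* on the argument list %3(X Y)%*, anticipating the variable
case of %3compexp%* described below, gives:

.BEGIN CENTERIT;SELECT 3;GROUP;
←complis[(X Y);-2;((X . 1)(Y . 2))] = ((MOVE AC1 -1 P)(PUSH P 1)(MOVE AC1 -1 P)(PUSH P 1))
.END
%1Both variable references come out as %3-1%* relative to the top of the stack: by the
time %3Y%* is compiled the offset has moved from %3-2%* to %3-3%*, compensating for the
%3PUSH%* of the first argument.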

Here's a brief description of the parts of the new compiler. This compiler was 
adapted from one written
by John McCarthy in conjunction with a course at Stanford University
entitled %3Computing with Symbolic Expressions%*. The McCarthy compiler
was also the subject of study by Ralph London in his paper,
%3Correctness of Two Compilers for a LISP Subset%*.

.BEGIN INDENT 0,15;
%3compile[fn;vars;exp]: fn%* is  the name of  the function to be  compiled. %3vars%*  is the
list of lambda variables.  %3exp%* is the lambda-body.

%3prup[vars;n]:  vars%* is  a  lambda list, %3n%* is an  integer.   %3prup%*  builds  a
variable-pair list.

%3mkpush[n;m]%*:  generates a sequence of %3PUSH%* instructions.

%3loadac[n;k]%*:  is a slight modification of the previous %3loadac%* ({yon(P56)}).
This version generates a series of %3(MOVE ACi ...)%* instructions rather than a
sequence of %3POP%*s; the arguments are then removed from the stack with a %3(SUB P (C 0 0 N N))%*.

%3compexp[exp;off;vpl]%*: this  function does most  of the work.   It is  analogous to
%3eval%*.
It generates the code for constants,
for a reference to a variable,
and for conditional expressions.
It is used for compiling code for a call on a function:
the argument list is compiled with %3complis%*, which will
leave the arguments on the stack; %3loadac%* loads the appropriate %3AC%*'s;
then we generate the %3SUB%* to sync the stack, and finally generate
a call on the function.

%3comcond[u;l;off;vpl]%*: this compiles the body of conditional expressions.  %3u%* is the
p%4i%* - e%4i%*  list; %3l%* will be  bound to a  generated symbol name
; %3off%* and %3vpl%* are as always.
.END
.END

Fortified by the previous %3compile%* functions and this introduction
the new %3compile%* should be clear.

.BEGIN NOFILL ;select 3;TURN OFF "←";

.GROUP
compile <= λ[[fn;vars;exp]
             λ[[n]
               append[list[mkprolog[fn]];
                      mkpush[n;1];
                      compexp[exp; -n; prup[vars;1]];
                      mksync[n];
                      mkexit[]]]
              [length[vars]] ]


.APART
.GROUP
prup <= λ[[vars;n] 
	  [null [vars] → NIL;
	   T → concat[cons [first[vars]; n]; prup [rest[vars];n+1]]]]

mkpush <= λ[[n;m]
 	    [lessp [n;m] → NIL;
	     T → concat[mksaveac[m]; mkpush [n;m+1]]]]


.APART
.GROUP
loadac <= λ[[n;k]
            [greaterp[n;0] → NIL;
             T → concat[mkloadac[k;n]; loadac[n+1;k+1]]]]

.APART
.GROUP

compexp 
 <= 
   λ[[exp;off;vpl]
     [isconst[exp] → list [mkconst[exp]];
      isvar[exp] → list [mkvar[exp;off;vpl]];
      iscond[exp] → comcond[args[exp];gensym [ ];off; vpl];
      isfun+args[exp] → compapply[mkcall[length[argsf[exp]];fn[exp]];
	                          length[argsf[exp]];
	                          complis[argsf[exp];off;vpl]]]]


compapply <=λ[[fn;n;arglist]
              append [arglist;
                      loadac [1-n;1];
                      mksync[n];
                      list[fn]] ]

.APART

.GROUP

comcond <= λ[[u;glob;off;vpl]
             [null[u] → list[mkerror[ ];glob];
              T → append[comclause[first[u]; gensym[];glob;off;vpl];
                         comcond[rest[u]; glob;off;vpl]]]]

comclause <=λ[[p;loc;glob;off;vpl]
               append[compexp[ante[p];off;vpl];
                      list[mkjumpe[loc]];
                      compexp[conseq[p];off;vpl];
                      list[mkjrst[glob]];
                      list[loc]]]

.APART
.END


Here is a partial sketch of %3compile%* operating on %3j <= λ[[x;y]f[g[x];h[y]]].%*

.BEGIN SELECT 3;TABIT2(10,17);CENTERIT;

.GROUP
compile[J;(X Y);(F (G X) (H Y))] %1gives:%*
\append\[((LAP J SUBR));      (**1**)
\\ mkpush[2;1];
\\ compexp[(F (G X)(H Y));-2;prup[(X Y);1]];
\\ ((SUB P (C 0 0 2 2)));
\\ ((POPJ P))];%1
.APART

.GROUP
where:
←%3mkpush[2;1]%* gives %3((PUSH P AC1)(PUSH P AC2)),%* and
←%3prup[(X Y);1]%* gives %3((X . 1)(Y . 2))%*.
.APART
.GROUP

%3compexp[(F (G X)(H Y));-2;((X . 1)(Y . 2))]%* results in:
%3
\append\[complis[((G X)(H Y));-2;((X . 1)(Y . 2))];
\\loadac[-1;1];
\\((SUB P (C 0 0 2 2)));
\\((CALL 2(E F)))]

%1and %3loadac[-1;1]%* evaluates to: %3((MOVE AC1 -1 P)(MOVE AC2 0 P))%*.
.APART
.GROUP

Thus (**1**) above is of the form:

%3
\\((LAP J SUBR)
\\ (PUSH P AC1)
\\ (PUSH P AC2)
\\ complis[((G X)(H Y));-2;((X . 1)(Y . 2))]
\\ (MOVE AC1 -1 P)
\\ (MOVE AC2 0 P)
\\ (SUB P (C 0 0 2 2))
\\ (CALL 2 (E F))
\\ (SUB P (C 0 0 2 2))
\\ (POPJ P)
\\)

.APART
.FILL;
%3complis%1 is interesting since it actually uses the %3vpl%* we have been
carrying along. It gives rise to:
.NOFILL
.GROUP

%3
\append\[compexp[(G X);-2;((X . 1)(Y . 2))];
\\ ((PUSH P AC1));
\\ complis[((H Y));-3;((X . 1)(Y . 2))]]

.APART
.GROUP
%1and the %3compexp%* computation involves, in part:

%3
\append[complis[(X);-2;((X . 1)(Y . 2))];
\\ ((MOVE AC1 0 P));
\\ ((SUB P (C 0 0 1 1)));
\\ ((CALL 1 (E G)))]

.APART
.GROUP
%1Finally this %3complis%* generates the long awaited variable reference using:

%3compexp[X;-2;((X . 1)(Y . 2))] %1giving,
\\%3((MOVE AC1 -1 P))%*.
.APART

Notice that the offset is different within the call:

←%3 complis[((H Y));-3;((X . 1)(Y . 2))].%1

Finally, you should convince yourself that the code, though inefficient, is correct.
.END
.APART

←%2Problems%*

I  Simple problems

1. Evaluate %3compile[%1  h.s.] 
etc.

2. Complete the code generation for the above example.


.SS(A project: Efficient compilation)
.P35:
Certainly the most striking feature of the last %3compile%* is its
outrageous inefficiency. Examination of the compilation of the
most simple function suggests many possible improvements. 

We should first clarify what we mean by efficiency in this context.
We might mean minimal compile-time. In this case we would want a very
simple compiler; this usually means that the compiled code is extraordinarily
bad. Such compilers might suffice for debugging runs or student projects.
More likely, efficient compilation is taken to mean production of
code which we could expect from a reasonably bright machine-language programmer.
It should run reasonably fast, not have obviously redundant instructions, and
not take too much space in the machine. It is this second interpretation
of efficiency which we shall use.

A major inefficiency occurs in saving and restoring  quantities on the
stack. For example, the sequence %3(PUSH P AC1)(POP P AC1)%* serves no
useful purpose. This is a symptom of a more serious disease. 
The compiler does not remember what will be in the ACs at run-time. Since the
arguments to a function call must be passed through the special AC registers
and since it is expensive to save and restore these registers, we should
make a concerted effort to remember what quantities are in which ACs and
not reload them unnecessarily. This process will certainly  make the
compiler more complicated and that will mean longer compilation time but
compilation usually occurs less frequently than execution of compiled
code.
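
As a concrete instance, consider the code which the last %3compile%* produces
(see the sketch in the previous section) for the inner call %3g[x]%* of %3j%*:

.BEGIN TABIT1(20);SELECT 3;GROUP;

\(MOVE AC1 -1 P)
\(PUSH P 1)
\(MOVE AC1 0 P)
\(SUB P (C 0 0 1 1))
\(CALL 1 (E G))

.END
%1A compiler which remembered that the value of %3x%* was already sitting in AC1
could emit just the first and last instructions.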

Here are some possibilities.

****ADD 'EM*****


diatribe about go to
A major obstacle to  this kind of optimization is the unrestricted use
of labels and gotos. Consider a piece of compiled code which has a label attached
to it.
 Before we can be assured of the integrity of an AC
we must ascertain that every possible path to that label maintains that AC.
This is a very difficult task. The label and goto structure required by
%3compile%* is quite simple. However if we wished to build a compiler for
LISP with %3prog%*s we would have to confront this problem.
.SS(A project: One-pass assembler)
.P10:

III A one-pass ⊗→assembler↔←.

Write a one-pass assembler for the code generated by the %3compile%* function
of this section. You should be aware of the following points:

.BEGIN INDENT 0,5;

%2a.%*  %3QUOTE%*d expressions must be protected from garbage collection. The
simplest way to accomplish this it to  save them on a list, say %3QLIST%*.

%2b.%*  Use the operation codes of {yonss(P7)}.
****MORE ON INST FORMAT.***

%2c.%*  Design a simple fixup scheme.  Notice that %3compile%*'s code will
require address fixups at most.

%2d.%*  Recall that the format of the code is a list. The items in the list are
either atomic -- representing labels --, or lists -- representing instructions--.
The instructions have varying format.  Perhaps a ⊗→table-driven↔← scheme can be used.

%2e.%*  Some attempt should be made to generate error messages.

%2f.%*  Many of the constants, %3(C 0 0 %*n n%3)%*, occur frequently; these constants
are only referenced, never changed by execution of the compiled code.  Write
your assembler to maintain only one copy of each. The constants should be stored
directly after the compiled code.  

%2g%*.  Try to be equally clever about storing %3QUOTE%*d expressions.
.P36:

IV ***problem on internal lambdas

.END
.SS(A project: Syntax directed compilation,syntax-directed,P4:)

compilation for sae

BNF for mexprs

syntax directed compiler

scanner parser
.SS(A deep-binding compiler,deep binding)

**sketch of tgmoaf deep binder conventions

**project: do tgmoafr and eval d-b compr

**hints about globals and name stack

**project: do eval compiler using name-value stack for locals.

.SS(Compilation and global variables,global variables,P5:)
.BEGIN TURN ON "#";

**** expand greatly***

The models of compilation which we have sketched so far store
their  λ-variables  in  the  stack,  %3P%*.   References  to  those
variables in  the body of  the λ-expression are  made to  those
stack entries.  This scheme suffices  only for lambda or %3prog%* (local)
variables.   We have said that λ-expressions may refer to global
or free  variables.  The  lookup mechanism  simply finds the  current
binding of  that global in the  symbol table.  Notice that  this is a
different strategy than the global lookup of Algol.  In Algol, when a
procedure refers to a global we take the  binding which was current at
the  point of definition  of the procedure.   The  LISP mechanism is
called %2⊗→dynamic binding↔←%*.   It corresponds to physically  substituting
the body  of the definition  of the  function wherever it  is called.
Its model  of implementation is simpler than that required for Algol.
We don't need the ⊗→static chain↔← for this case of  global lookup.  Thus
interpreted expressions encounter  no problems when faced with global
variables.  There are potential  difficulties for compiled code.   If
all we store on the stack in a compiled function is the value of a
variable, then another program which expects to use that variable
globally will have no way of finding that stored value.  One scheme
is to store pairs on the stack: name and value (LISP#1.85); then we
can search the stack for the latest binding.  It works.  With this
scheme we  can dispense  with the  ⊗→%3VALUE%*-cell↔←.  Actually this  scheme
isn't  all that bad.   The  compiler can still  `know' where  all the
local variables  are on  the stack  and  can be  a bit  clever  about
searching for  the  globals.   Notice this  is the  old symbol  table
mechanism  (⊗→a-list↔←)  again.   LISP#1.85 was  designed  for  a paging
machine (XDS#940) and a  few unpleasant features crept in because  of
this.   (However, on  the positive  side some  nice coding  tricks to
minimize page references also arose.)
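
A small (hypothetical) example makes the dynamic binding rule concrete. Suppose
%3x%* has a global value of %32%* and we define:

.BEGIN CENTERIT;SELECT 3;GROUP;

←f <= λ[[ ] x]
←g <= λ[[x] f[ ]]

.END
%1Under LISP's dynamic binding, %3g[7]%* gives %37%*: when %3f%* is entered the latest
binding of %3x%* is the λ-binding made by %3g%*. Under Algol's rule %3f%* would see the
binding current at its point of definition, namely the global %32%*.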

The solution proposed  by the LISP 1.6  implementation called
%2⊗→shallow  binding↔←%*, allows  the compiled  code  to directly  access the
%3VALUE%*-cell in the symbol table.  There is an artifact in the compiler
(and  assembler) called  %3SPECIAL%*.    Variables which  are  to be  used
globally  are declared ⊗→special variables↔←.   When a  variable, say %3x%*, is
declared special the compiler will emit a reference to %3x%* as 
%3(MOVE#AC%4i%*#(SPECIAL#X))%1
or %3(MOVEM#AC%4i%*#(SPECIAL#X))%1 rather than the corresponding
reference to a location on the  stack.  When the LISP assembler  sees
the indicator %3SPECIAL%*, it will search the  symbol table for the %3VALUE%*-cell
 of %3X%* and  assemble a reference to that cell.  Since the location
of the value  cell does %2not%*  change, we can  always find the  current
binding posted  in the  %3cdr%*-part of  that cell.  Notice too  that any
interpreted function  can also sample the %3VALUE%*-cell so global values
can be passed between compiled and interpreted functions.  The values
of local variables are still posted on the stack.
This then is the reason for depressing the actual value one level.

****pic
.GROUP SKIP 6;

We have not yet  discussed the mechanism which will  allow us
to  pass back and  forth between compiled  and interpreted functions.
To complete that discussion we must introduce the %aSM%* instruction for
calling a function:

.BEGIN TABIT1(19);TURN OFF"←";

PUSHJ P fn\%3C%*(P) ← %3C%*(P) + 1
\%3C%*(%3C%*(P)) ← %aPC%*
\%aPC%* ← fn. This is called the "push-jump" instruction. Exit with POPJ.
.END

First we 
require that the  calling conventions for both kinds  of functions be
isomorphic.

When an interpreted function calls  a compiled (or primitive)
function, %3eval%* will  look for the indicator,  %3SUBR%*; then retrieve the
machine address of the code and enter via a %3PUSHJ%*. That code should exit
(back to %3eval%*) via a %3POPJ%*, after assuring that the stack, P, has
been restored to the state at the time of entry.

Compiled functions  call other functions  via the %3(CALL#%1n%*#(E#%1fn%3))%1
artifact.  The %3CALL%*  opcode is actually  an illegal instruction
which is trapped to a submonitor inside %3eval%*.  This  submonitor looks
down the  property list  of the  atom, fn,  for a  function indicator
(%3SUBR, EXPR%*,  etc).  The  function is  called  and  control  is then
returned to  the address  immediately following  the %3CALL%*.   In  many
cases this %3CALL%* can be replaced  by a %3(PUSHJ#P#%1fn%3)%*, but not always as
we will see next.
.END
.SS(Functional Arguments,,P3:)

***more h.s.***

***farting with funarg***

funarg b-w. and weizen

**add deep-binding compiler**

What does this say about the CALL mechanism in the compiler?  It says that
the calling mechanism for a functional argument must always be
trapped to the submonitor and decoded.  We cannot replace that call with
a PUSHJ to some machine language code for the function because the
function referred to can change.  We actually use  a CALLF instruction
to designate a call on a functional argument.
	
.SS(Macros and special forms,macros,P57:)
Recall our discussion of macros and ⊗→special forms↔← in {yonss(P18)}.
We wish  to extend our compiler to handle such definitions.

Consider the example of defining %3plus%* of an indefinite number of arguments
given on {yon(P58)}.
Notice first that the difference in efficiency between the interpreted macro ({yon(P58)})
and the interpreted special form ({yon(P59)}) is very slight.  Both require calls on %3eval%*.
One requires explicit user calls on the evaluator; one does it
within the evaluator.
In the presence of a compiler we can frequently make execution of macros
more efficient than their special form counterpart. This is the case with %3plus%*.
When %3plus%* is called we know the number of arguments, and can simply
expand the macro to a nest of calls on %3*plus%*. For example:
.begin centerit;

%3
←plus[4;add1[2];4] %1expands to%* *plus[4;*plus[add1[2];4]] %1
which can be efficiently compiled. 
.end
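
A sketch of one way such a macro might be written (illustrative only; it may differ
from the definition given on {yon(P58)}):

.BEGIN CENTERIT;SELECT 3;GROUP;

←expand <= λ[[u][null[cdr[u]] → car[u]; T → list[*PLUS;car[u];expand[cdr[u]]]]]

←plus <%4m%*= λ[[l]expand[cdr[l]]]

.END
%1Since the whole call %3(PLUS 4 (ADD1 2) 4)%* gets bound to %3l%*, %3expand%* is applied
to the argument list and produces the nested %3*plus%* form shown above.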

.P73:
There is a closely related use of LISP macros which is worth mentioning.
Recall on {yon(P60)} we defined %3coef%* 
as %3car%*. Compiled calls on %3coef%* would invoke the function-calling
mechanism, whereas many compilers can substitute actual hardware instructions
for calls on %3car%*, resulting in more efficient run-time code.
So for efficiency's sake it would be better to write %3car%* instead of
%3coef%*. There are two objections to this. First, %3coef%* has more
mnemonic significance than %3car%*. Second, using %3car%* we have explicitly
tied our algorithm to the representation. Both are strong
objections.
Macros can help overcome both objections. Define:

←%3coef <%4m%*= λ[[l]cons[CAR;cdr[l]]]%1.

(Recall that the whole call %3(COEF ... )%* gets bound to %3l%*.)
So the user writes %3(COEF ...)%*; the evaluator sees %3(COEF ...)%* and
finally evaluates %3(CAR ...)%*; and the compiler sees %3(COEF ...)%*
and compiles code for %3(CAR ...)%*. So we can get the efficient code, the
readability, and flexibility of representation with macros.

Since %3eval%* handles calls on special forms, we should examine the
extensions to %3compile%* to generate such code. We have seen that
in compiling  arguments to (normal) functions, we generate the code for
each, followed by code to save the result in the run-time stack, %3P%*.
The argument to a special form is %2unevaluated%*, by definition. All we
can thus do for a call of the form %3f[l]%*, where %3f%* is a special form,
is pass the argument, compiling something like:

.BEGIN CENTERIT;SELECT 3;GROUP;

←(MOVEI AC1 (QUOTE (l)))
←(CALL 1 (E F))
.END

This should also be clear from the structure of %3FEXPR%* calling in the %aSM%*
evaluator. 


←%2Problems%*

I. Extend the last %3compile%* function to handle macros.

II.  We have seen the (necessarily) inefficient code generated by a compiler
for %3FEXPR%* calls.  Assume ⊗→%3and%*↔← is a special form with an indefinite
number of arguments.  Show how to modify %3compile%* to recognize %3and%*
and compile efficient code for its execution.

.SS(Debugging in general)
Few areas of Computer Science are as primitive
as debugging, and few are as important; getting a
correct program is, indeed, the whole point of our programming.
The power of our debugging techniques  has been directly related to the
sophistication of the hardware/software interface which is available.
Not until the advent of sophisticated on-line systems has there really 
been any hope for practical "correct-program" construction.

We will begin  with  a  very  brief historical  description  of  systems
organization, from the bare machine to multi-processing.
	In  the  early  days of  computers,  operating  systems  were
non-existent.   You would  sign up for  a couple of  hours of machine
time, appear  with your  card deck  or paper  tape,  and operate  the
machine.  Debugging devices consisted of a sharp pointed stick, and a
quick  hand on the console  switches.  This  means of programming was
very satisfying to many of the programmers, but machine utilization
left something  to be desired.   Much of  the time the  machine would
sit idle (stopped) as  you would think  about what to  do next.   
Debugging and programming were both done at the machine level.


As processing speed and  machine costs increased it became  evident that
this  mode  of operation  had to  go.   The  first  operating systems
consisted of a dull  slow object called  an operator and a  satellite
computer on which an input tape  consisting of many separate jobs was
created.   Each  job  on the  input tape  was sequentially  read into
memory, executed and the output presented to an output tape.  If some
abnormal  behavior  was  detected  in  your program,  you  were  also
presented with an  often uninspiring  octal dump of  the contents  of
memory. Memory dumps were an appalling debugging technique, even then.
Programming was frequently done in a higher level language so the contents
of most machine locations had little meaning to the programmer. Also the
state of the machine at the time the dump was taken frequently had only 
a casual relationship with the actual bug.


In the late '50s several people ⊗↓Including J. McCarthy.← began advocating
time-sharing, or interactive systems, which would allow the programmer
to "play with" the machine as if he were the sole user, but when he was
%2thinking%* about his program, the machine could be servicing some
other programmer's needs ⊗↓The hardware and software necessary to support such
behavior is quite interesting, but not relevant to our current discussion.←.
What makes  time-sharing viable  is the tremendous  difference
between human reaction time and machine speeds.  In a period of a few
seconds a well  designed   system can satisfy  simple requests  by
many users.

terminals
.SS(Debugging in LISP)

When can the  %3CALL%* instruction be  replaced by a %3PUSHJ%*?   The
%3PUSHJ%*  instruction  is  used to  call  a  machine language  function.
Obviously if we are calling an interpreted function (it has
indicator %3EXPR%* or %3FEXPR%*), %3PUSHJ%* is the wrong thing to do.  In this
case  we must pass  control to  %3eval%*, evaluate the  function with the
appropriate arguments,   return the value  in %3AC1%* and finally  return
control to the instruction following the %3CALL%*.  If the function being
called is a  machine language  routine (indicator is  %3SUBR%* or  %3FSUBR%*)
then  we  could  replace   the  %3CALL%*  with  a  %3PUSHJ%*.  This  would
`short-circuit'  the call on  the submonitor  and the calling  of the
function could be  done more quickly.   There  are many occasions  in
which we do not wish to make this replacement, however.

Assume for  the  moment that  I am  mortal and  that my  LISP
program  has some  bugs.   Crucial  pieces of  information  about the
behavior of the program are: which functions are being  entered, what
are  the actual  parameters, and what are the  values being returned.
Assume that  we wish to monitor the behavior of the function, %3FOO%*. We
will hide the  real definition of %3FOO%*  on another symbol table  entry
(using %3gensym[]%*)  and redefine %3FOO%* such  that, when it  is called, it
will:

.BEGIN narrow 10;;

%21.%*  print the values of the current actual parameters.

%22.%*	use %3apply%* to call the real definition of %3FOO%* with the current parameters.

%23.%*	print the value of the call on %3FOO%*.

%24.%*	return control to the calling program.
.END
This device is called tracing.

Since  %3FOO%*  may  be  recursive  we   should  also  give  some
indication of the  depth of recursion being executed.  Now every call
on %3FOO%* will give us  the pertinent statistics.  Interpreted calls  on
%3FOO%* will  go through %3eval%*, and  if %3(CALL ... FOO)%* is being  used in the
compiled  code  the  submonitor  can  pass  control  to  the  tracing
mechanism; but if the %3CALL%* is replaced by a %3PUSHJ%*, the control passes
directly to the machine language code for %3FOO%* and we cannot intercept
the call.
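
Here is a minimal sketch of such a tracing interlude.  Assume (hypothetically) that
the real definition of the traced function has been hidden on a %3gensym%*-ed atom whose
name is kept under the indicator %3TRACED%* on the p-list of the function name, and that
%3depth%* is a global counter, initially %30%*.  The submonitor, on intercepting a
%3(CALL ... FOO)%* for a traced %3FOO%*, could then pass the name and the argument list to
something like:

.BEGIN SELECT 3;TABIT1(25);GROUP;TURN OFF"←";

tracecall <= λ[[fn;args]prog\[[val]
\depth ← add1[depth];
\print[list[ENTERING;fn;depth;args]];
\val ← apply[get[get[fn;TRACED];EXPR];args];
\print[list[LEAVING;fn;depth;val]];
\depth ← sub1[depth];
\return[val] ]]

.END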

On most implementations  of LISP the %3PUSHJ-CALL%*  mechanism is
under  the  control   of  the  programmer.    After  the  program  is
sufficiently debugged,  replace  the  %3CALL%*  with the  %3PUSHJ%*  and  the
programs will  execute faster.   But  be warned  that this  action is
irreversible on most machines; once the %3CALL%*s have been overwritten it's tough beans!!

A variant of this tracing scheme can  be used to monitor %3SET%*s
and  %3SETQ%*s in  interpreted %3prog%*s.   Since calls  on %3SET%* and  %3SETQ%* are
interpreted (by %3eval%*  and Co.),  we can modify  their definitions  to
print the  name of the  variable and the  new assignment, do  it, and
return.  (question: can you see why this perhaps won't (or shouldn't)
work for compiled code?)

This is a very brief description of debugging in LISP.  It again shows
some of the power resulting from having an evaluator available at runtime.
Much more sophisticated debugging techniques
can  be implemented,  particularly  in an  on-line  implementation of
LISP. Perhaps the most comprehensive on-line LISP system is INTERLISP, an
outgrowth of BBN LISP. The details of this undertaking would take us too far
afield from our current discussion.
.SEC(Storage structures and efficiency)

Though it is true that any algorithm can be coded in terms of
manipulations of binary trees, there are many instances in which more
efficient organizations of data exist.

At the most elementary  level are vectors and arrays.   These
data structures could  certainly be stored in a list structure format
and individual components accessed via %3car-cdr%* chains.  However, most
machines  have a  hardware  organization which  can  be exploited  to
increase   accessing   efficiency  over   the   list  representation.
Similarly, strings can  be represented as  lists of characters.   The
string processing operations are expressible as LISP algorithms.  But
again this is usually not the most reasonable representation. Even at
the level of list-structure operations, simple binary trees might not
be the most  expeditious representation for every problem.  Also many
of the algorithms we  have presented in  LISP are overly wasteful  of
computation time.  This section of notes will begin an examination of
alternatives   to  LISP  organization.     There  is   no  best  data
representation, or no best algorithm.  The representations you choose
must match your machine and  the problem domain you are studying.  If
your application is strictly numerical then list-structure is not the
answer; if you wish to manipulate simple linear strings of characters
then list processing is too general.  And there are many applications
of list processing which are sufficiently well-behaved that you don't
need a storage  allocation device as complex as  a garbage collector.
The point  is that if you have seen the list-processing techniques in
a rich environment  such as  LISP and have  seen the applications  to
which  LISP may  be put,  then you  will be  prepared to  apply these
techniques  in  a  meaningful  way.    Many  times  an  (inefficient)
representation in  LISP  is all  that is  needed.   You  get a  clean
representation  with comprehensible algorithms.   Once you've studied
the algorithms, efficiencies might come to mind. At that  time either
recode the problem using some  of the obscene LISP programming tricks
or drop into machine language if very fine control is required.  Once
you have  %2a%*  representation it  is  easy to  get  better ones.    For
example,  once you have  a compiler  which is   correct  it is
easier to describe a smarter one.
		
This section will describe other representations than LISP binary trees
and will show some ways of increasing the efficiency of LISP programs.

.SS(Bit-tables)
Bit tables: In the marking phase of a garbage collector it is
necessary to  record the marking of each word.   On many machines the
marking of a word is signified by setting a bit in a bit table or bit
map.  For example:




.GROUP
.GROUP SKIP 8;
←%2Bit map for garbage collector%*
.apart




This might be done for several reasons.  The natural choice of
setting a mark-bit in the actual word being marked may not be
possible or may not be the best strategy: 
.BEGIN INDENT 0,6;
1.  for words in  FS, there is  no
room if each  word contains exactly two addresses; or  

2. the word is
in  FWS and  the  meaning of  the information  stored there  would be
changed;  

3. also  the  garbage  collector must  initialize  all  the
mark-bits to  zero before the actual marking  process begins (look at
the g.c. algorithm).  It is faster  on most machines to zero a  whole
table rather than zero single bits in  separate words; and finally 

4. in  garbage collectors for more  complicated data structures, marking
with a bit table becomes a necessity.

.END
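
As a concrete, though hypothetical, illustration of the indexing involved: on a 36-bit
word machine with free space beginning at location FS%40%*, the mark bit for the word at
location loc would live in bit (loc - FS%40%*) mod 36 of word (loc - FS%40%*)/36 of the
table.  Initializing the table therefore touches only one thirty-sixth as many words as
zeroing a mark-bit in every word of free space would.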
.SS(Vectors and arrays)
.BEGIN TABIT2 (10,40);CENTERIT;FILL;
%2Vectors%*:  Vectors (or  one  dimensional arrays)  are  usually
stored sequentially in memory.  Simple vectors are usually stored one
element to a  memory location though  this is not  a necessity.   For
example a complex vector is very  naturally stored as pairs of cells;
or if  perhaps you would allow vectors of non-homogeneous data modes,
each element would  have to include type  information.  In any  case,
most languages make some restrictions on the behavior of vectors such
that efficient accessing of elements can be made.  There is a natural
simulation of  a stack as  a (sequential) vector  with access  to the
stack made via a global pointer to the vector.

%2Arrays%*: Arrays are multi-dimensional data  structures.  Since
most  machine memories are linear  devices we must map  arrays onto a
linear representation.  We will restrict attention to the case of
two-dimensional  arrays.   Most  of the  discussion generalizes  very
naturally.   A very common device is to store the array by rows; that
is,  each  row is stored sequentially,  first, row 1; then  row 2,...
Given this representation there  is a trivial calculation to find the
location of an arbitrary element, A[i;j], if we know the location  of
the first  element, A[1;1] and  the extent of  the dimensions  of the
array.  For an array A[1:M; 1:N]

←loc[A[i;j]] = loc[A[1;1]] + (i-1)*N + (j-1)
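
For example, for an array declared as A[1:3; 1:4] and stored by rows,
loc[A[2;3]] = loc[A[1;1]] + (2-1)*4 + (3-1) = loc[A[1;1]] + 6.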

In languages like Fortran which require that the size of the array be
known at compile-time  the compiler can generate the accessing code
without problem.  Languages, like Algol 60, and some versions of LISP
which allow the size of an  array to be determined at runtime require
a bit more care.  Algol 60, for example requires that you declare the
type (real, boolean,  etc.) of  the array and  specify the number  of
dimensions  in the  array,  but you  can postpone  until  runtime the
designation of the size of each dimension.  To handle this complexity
a dope vector is  introduced. The compiler can determine  the size of
the dope vector, but  not the contents.  The dope vector is filled in
at runtime and  contains information about the  actual extent of  the
array bounds.   Also since  the size of the  array is not  known, the
compiler  cannot   allocate  space  for  the  array  elements.    The
allocation must be  done at runtime.   When the array declaration  is
executed we must  know all the information about the  array.  At that
time we add the  array-bound information to the  dope vector and  add
information telling where  to find the  array elements.   Assume that
the  array elements are stored  by rows.  Look  at the calculation of
the location of element, A[i;j].   For specific execution ofan  array
declaration much  of this information  is constatnt; the  location of
the array  elements, in particular, A[1;1] and the number of columns,
N, is fixed.  Thus we rewrite the calculation as:

\constant part\variable part

\ [loc [A[1;1]]-N-1]       +\  	(i*N+j)

The constant  part is stored  in the  dope vector.   When we wish  to
reference  an element A[i;j] we  need only compute  the variable part
and add it to the constant part.
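
Continuing the small example above, for A[1:3; 1:4] the constant part stored in the
dope vector is loc[A[1;1]] - 4 - 1; referencing A[2;3] computes the variable part
2*4 + 3 = 11, and the sum is again loc[A[1;1]] + 6.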

The dope vector for A[1:M; 1:N] might contain
.GROUP SKIP 10;




There is another scheme  for storing arrays which is  used in
some of  the Burroughs machines. Each row  is stored sequentially and
access  to  separate  rows  is   made  through  a  device  called   a
`mother-vector'.   The mother vector is  a vector of pointers  to the
rows.  Thus:




.GROUP SKIP 10;





Notice that the accessing computation is very  cheap.  Another effect
is  that all rows  need not  be in memory  at once.   On a  paging or
segmenting machine (we  will talk  about machine organization  later)
this array organization can be helpful.  If an access to a row not in
core  is made, a `page  fault' is raised; the  monitor brings the row
into memory and the computation continues.   The mother-vector scheme
generalizes  nicely  to   multidimensionality  and  can  be  used  in
conjunction with a dope vector.

.END
A typical implementation of an array facility in LISP would include
a declaration:

.BEGIN INDENT 0,10;
⊗→%3array%*↔←%3[%1<identifier>;<type>;<bounds>; ... <bounds>], where the
identifier names the array; the type could be numeric or sexpr; and  finally
a declaration of upper and lower bounds for each dimension would be needed.
%3array%* is a special form whose effect is to make the array name a %3SUBR%*,
whose code is the calculation of the dope vector. Thus:
.END
.GROUP SKIP 15;
.BEGIN INDENT 10,10;
If we are to store sexprs in the array then the ⊗→garbage collector↔← must
be able to mark the entries. This is the reason for including type information.

When an array element is to be referenced, then the subscripts are evaluated
(recall that the array name was declared as a %3SUBR%*) and the dope vector code
is executed, resulting in a reference to the appropriate cell.
.END

We also must be able to store information in the array.
.BEGIN INDENT 0,10;
%3store[%1<name>[<subscr>; ... <subscr>];<value>] : ⊗→%3store%*↔← is a special form
whose effect is to store the value of <value> in the designated array element.
.END
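
As a purely hypothetical illustration of how such a facility might be used (the exact
form of the <bounds> is implementation dependent), a declaration like
%3array[GRID;FIXED;(1 10);(1 5)]%1 would make %3GRID%* a %3SUBR%* describing a
10-by-5 numeric array; thereafter %3GRID[2;3]%* would fetch an element and
%3store[GRID[2;3];7]%* would set it to %37%*.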
.SS(strings and linear LISP,string processor,P27:)
.BEGIN TURN ON "←";
Strings and string processors are an important class
of data  structures and algorithms.  Powerful  string processing is a
necessity  for any  well  developed  compiler-writing  system.    The
organization  and  implementation  of   a  general  string  processor
directly parallels LISP.  In fact a subset of LISP,
⊗→linear LISP↔←,  is a nice notation for string algorithms.

A string is a sequence of characters.   A reasonable language
(not PL/1) for string processing should allow the creation of strings
of arbitrary length at runtime; it should allow the generation of new
strings and the  decomposition of existing  strings.  If  strings of
arbitrary length are  to be created, an organization similar to FS in
LISP can be used with, perhaps, a string garbage collector.   We will
assume this most general case.
The machine memory will  contain at least a ⊗→string  space↔←, an
evaluator, a symbol table, and a garbage collector.

String space is a linear sequence of cells, each of which can
contain exactly one character.  A string will be represented as a
sequence of sequential character cells.  A string variable will be an
entry in  the symbol table; the current value of the variable will be
represented as a pair; character count and a pointer to the beginning
of the character sequence.

Thus:

.GROUP SKIP 4;

.BEGIN TURN OFF"←";
We must make some decisions about how  we manipulate strings: when we
perform %3x ← y%*, do we copy the symbol table pair of %3y%* into that of %3x%*, or
do we make a new string isomorphic to %3y%* and point %3x%* at it.   It makes
a  difference.    We  will   choose  the  former,  copying  only  the
`descriptor',  that  is,  we  will  share  strings  (and  substrings)
wherever possible. This decision makes the  storage requirements less
expensive, but will make our  life more difficult when we worry about
⊗→garbage collection↔←.   There are  three  primitive functions: ⊗→%3first%*↔←,
⊗→%3rest%*↔←, ⊗→%3concat%*↔← (read: %3car%*, %3cdr%*, %3cons%*).
.END

.BEGIN INDENT 0,10;
%3first[x]%*  is  the first
character of the string represented by %3x%*.  %3first%* is undefined for the
empty string. For example:
.END

←%3first[ABC]%1 is %3A; first[%1∧%3]%* is undefined.

%3rest[x]%* is  the string  of characters  which remains  when the  first
character of  the string is deleted.  %3rest%* is also undefined for the
empty string. For example:

←%3rest[ABC]%* is %3BC%*

.BEGIN INDENT 0,10;
%3concat[x;y]%* is a function of two arguments.   %3x%* is a character; %3y%* is
a string. %3concat%* forms a string consisting of the concatenation of a
copy of %3x%* and a copy of the string, %3y%*.  For example:
.END

←%3concat[A;BC]%* is %3ABC%*
	
There are three string predicates:

.BEGIN CENTERIT;
←⊗→%3char%*↔←%3[x]%1:  is %3x%* a single character?
←⊗→%3null%*↔←%3[x]%1:  is %3x%* the empty string?
←%3x ⊗→%3=%*↔← y%1:  are %3x%* and %3y%* the same character?

For example:

←%3char[A] %1is true
←%3char[AB] %1is false
←%3AB = AB %1is undefined
.END

	Now to implementations:
%3first%* generates a character  count of 1 and a  pointer to the
first character of the parent string.
%3rest%* generates a character count of one less than that of the
parent and a pointer to the second character of the parent string.

%3concat%* is a  bit more  problematic.   We copy %3x%*  and copy  %3y%*,
generate  a character  count of  the sum  of those  of %3x%*  and %3y%*,  and
generate a pointer to the character of the copy of %3x%*.  The copies are
made in the free string space pointed to by the %2⊗→string space pointer↔←%*.
The obvious  question of  what to do  when string space  is exhausted
will be postponed  for a moment.   The implementations  of the  three
predicates are easy: we will blur  the distinction between characters
and strings of length 1.  Thus %3char%* need only check that the character count is 1;
%3null%* checks that the character count is 0.  What about = ?  Obviously characters
are  not  stored   uniquely,  so  we must  make  an  actual character
comparison.
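
A minimal sketch of these implementations, under some (hypothetical) assumptions:
%3count[x]%* and %3ptr[x]%* extract the two halves of a descriptor, %3descr[c;p]%*
builds a new descriptor, and string-space pointers are treated as integers so that
%3add1%* moves to the next character cell:

.BEGIN SELECT 3;GROUP;CENTERIT;

←first <= λ[[x]descr[1;ptr[x]]]
←rest <= λ[[x]descr[sub1[count[x]];add1[ptr[x]]]]
←char <= λ[[x]eq[count[x];1]]
←null <= λ[[x]eq[count[x];0]]

.END

Note that %3first%* and %3rest%* allocate no string space at all; only %3concat%*
consumes it.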

Now  ⊗→garbage collection↔←.    In  some  ways a  string  garbage
collector is simpler and in some ways more difficult than a collector
of list-structure.   The  marking phase  is much  simpler: using  the
descriptor in the symbol table, mark  the character string.  Since we
are sharing  substrings, we cannot stop the marking simply because we
have encountered a previously  marked character. 
But at least  the marking is not recursive.   However, the collection
phase needs to be  more sophisticated for  string processors.   Since
strings are  stored linearly (or  sequentially), a  fragmented string
space  list  is  of  little  use.    Thus  we  must compact  all  the
referenceable strings into one end of string space, freeing  a linear
block for the new free string space.  Since we are sharing substrings
a  little care  needs  to  be exercised.    When  we move  a  string,
obviously the descriptor of any variable referencing any part of that
parent string must be changed to reflect the new location.  So before
we begin the relocation of strings  we sort the string descriptors on
the  basis of their  pointers into  string space.   We then recognize
each parent string, moving it down into freed  locations and updating
the address  pointers of any  substrings.  We  continue this process.
Eventually all strings will  be compacted down  in string space.  The
free space  pointer will  be set  and the  computation can  continue.
Compacting garbage collectors  can be adapted for use in LISP or more
general types of data structures.

.END
.SS(%3rplaca%* and %3rplacd%*)

.BEGIN TURN ON "←";
We  will  first look  at  some LISP  coding
tricks which when used judiciously, can increase efficiency, but when
used in a cavalier manner can result in mystery.  First, LISP does an
awful lot of copying.  Consider:

%3←append <= λ[[x;y][null[x] → y;T → cons[car[x];append[cdr[x];y]]]]%*

This function copies %3x%* onto the front of %3y%*. Note: %3y%* is not copied.
Or recall the %3subst%* function: it generates  a copy with the correct
substitutions made.   The copying is necessary  in general. It  keeps
unsavory side effects from happening.

Let's look at %3append[(A B C);(D E F)]%*.  It appears that we
could get the appropriate effect of %3append%* by %3cdr%*-ing down the list
%3(A B C)%* until we found the terminator, then replace it with a pointer
to the list %3(D E F)%*.  Thus:


.GROUP SKIP 6;

What's wrong here?  Consider the sequence of statements:

.BEGIN TABIT1(30);TURN OFF"←";SELECT 3;

\i ← (A,B,C)

\j ← (D,E,F)

\k ← append[i;j]
.END

Then if %3append%* worked  as advertised above, (changing the  %3cdr%* of the
last element of %3i%*) the following evil would be perpetrated: the value
of  %3i%*  would be  changed  surreptitiously!!  %3i%*  now  has   the  value
%3(A,B,C,D,E,F)%*.   Language features which  do this  are evil.   It is,
however,  quite useful to be  evil sometimes.   Notice that any value
which was sharing  part of the structure  of %3i%* will also  be changed.
This can cause real mystery!! Well the world is good, and %3append%* does
not work as above.   The LISP  function which %2does%*  work this way  is
called %3nconc%*.   It  can be  defined in  terms of  a primitive  obscene
function, %3rplacd%*.  There are two primitive obscene functions:

⊗→%3rplaca%*↔←%3[x;y]%* replace the %3car%*-part of %3x%* with %3y%*.


.GROUP SKIP 6;




⊗→%3rplacd%*↔←%3[x;y]%* replace the %3cdr%*-part of %3x%* with %3y%*.



.GROUP SKIP 6;




Thus %3nconc%* can be defined as:
.BEGIN SELECT 3;TABIT1(20);TURN OFF"←";

nconc <= λ[[x;y]prog\[[z]
\ [null[x] → return[y]];
\ z ← x;
\a[null[cdr[z]] → rplacd[z;y];return [x]];
\ z ←cdr [z];
\ go[a] ]]

.END
These functions  must be  used with  extreme caution.   They are  not
recommended for amateur hacking.  They are introduced here simply to
show that  it  is  possible to  improve  on the  efficiency  of  pure
algorithms by resorting to these coding tricks.
Consider:

.BEGIN SELECT 3;TABIT1(30);TURN OFF"←";

\x ← (NOTHING CAN GO WRONG);
\rplacd[cdddr[x];cddr[x]];
\print[x];

.END

So we can use %3rplacd%* to generate circular lists (and to generate wall
paper  by printing  them!!).
Circular lists cannot be generated in LISP without functions like
%3rplaca%* and %3rplacd%*.
See the problem on {yon(P55)}.
In  general,  to circularize  a non-empty
list, %3x%*, %3rplacd[last[x];x]%* suffices where:

←%3last <=λ[[x][null[cdr[x]] → x; T → last[cdr[x]]]]%*

←%2Problems%*

%21.%1  What is the effect of evaluating %3rplacd[x;cdr[x]]%*?
.END


.SS(Applications of %3rplaca%* and %3rplacd%*)
We begin with rather simple examples. Consider the  problem of inserting
an element into the middle of a list.  For example let %3x%* be the list,
%3(A B C)%*.  If we wished to insert an atom, say %3D%*, between %3B%*
and %3C%*, we could perform:

.BEGIN CENTER; SELECT 3;turn off"←";
x ← cons[car[x];cons[cadr[x];cons[D;cddr[x]]]].
.END

Indeed, in general, we have little choice but to recopy the initial segment
of %3x%*, adding %3D%* into the appropriate place.  A similar technique is
obviously available to delete a specified element from the interior of a list.

Careful use of %3rplacd%* can, in some instances, replace the heavy use of %3cons%*.
%3cons%* always carries with it the threat of a call on the garbage collector.

For example, given the list %3(A B C)%* with pointers, %3x%* and %3y%*, into it
as follows,

.GROUP SKIP 5;

we could insert the element, %3D%*, %2after%* the first element in %3y%* by:

←%3rplacd[y;cons[D;cdr[y]]%*, giving:

.GROUP SKIP 5;

(Notice that %2one%* %3cons%* is unavoidable.)

But as always, be warned that the value of %3x%* has also been changed; and
any sexpr sharing the list %3x%* or %3y%* as a sublist has also been affected.

We can also use %3rplacd%* to delete not the %2first%*, but the next element,
in %3y%* by:

←%3rplacd[y;cddr[y]].%*

Similarly, we can use %3rplaca%* to modify an element in a list (or sexpr).
To change the first element in the list, %3y%*, to the sexpr, %3z%* use

←%3rplaca[y;z]%*.

Notice that the uses of %3rplacd%* for insertion and deletion are couched in terms
of insert %2after%* and delete %2after%*, rather than insert %2at%* or delete %2at%*.
If you look at a diagram you will see why:

.GROUP SKIP 7;


To delete the element %3B%* requires modifying the %3cdr%*-part of the predecessor
cell.  Similarly to insert at a specified cell. How could we write such modifying
functions?  A simple, perhaps inefficient scheme, would be to start another pointer
 from the beginning of the list, looking for the cell whose %3cdr%* pointed to the
desired spot; then make the modification.  
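
Here is a sketch of such a `delete-%2at%*' function in this style.  It assumes the cell
to be deleted, %3z%*, actually appears somewhere in the tail of %3x%* and is not the
first cell of %3x%*:

.BEGIN SELECT 3;TABIT1(22);GROUP;TURN OFF"←";

deleteat <= λ[[x;z]prog\[[w]
\ w ← x;
\a[eq[cdr[w];z] → rplacd[w;cdr[z]];return[x]];
\ w ← cdr[w];
\ go[a] ]]

.END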

If these `modification-%2at%*' functions were to be performed very frequently
then it might be worth starting %2two%* pointers down the list, one at %3x%*,
one say %3y%* at %3cdr[x]%*, as above.  Then %2testing%* could be done using %3y%*
and the %2modification%* could be done using %3x%*. When we move %3y%* to
%3cdr[y]%* we move %3x%* to %3cdr[x]%*.  What happens if we wanted to modify
%2before%* rather than %2at%*? We could proliferate the `back pointers', but
perhaps if this kind of generality is required a change of representation
is called for. More complicated data representations are discussed
in detail in ⊗→Knuth's Kompendium↔←, Khapter 2.


The LISP system uses %3rplaca%* and %3rplacd%* heavily; for example, functions
which modify properties on the p-lists use these functions. Here are
two p-list manipulating functions, ⊗→%3putprop%*↔← and ⊗→%3remprop%*↔←.

.BEGIN INDENT 0,10;

%3putprop%* is a function of three arguments, %3n%*, an atom; %3i%*, an indicator;
and %3v%* a value. The effect of %3putprop%* is to attach the indicator-value pair
to %3n%*.  If the indicator is already present then we will simply change its value
to %3v%*; if the indicator is not present then we will add the indicator-value 
pair to the front of the p-list.  Since %3VALUE%*-cells have a slightly different
format (see {yon(P50)}), there is a special glitch for adding or modifying them.
.END

.BEGIN SELECT 3;TABIT2(5,8);GROUP; TURN OFF"←";

putprop <= λ[[n;i;v] 
   prog[[m]
\\m ← cdr[n];
\a\[eq[car[m];i] → go[b]]
\\m ← cddr[m];
\\[m → go[a]];
\\[eq[i;VALUE] → rplacd[n;cons[VALUE;cons[cons[NIL;v];cdr[n]]]];return[v];
\\ T → rplacd[n;cons[i;cons[v;cdr[n]]]];return[v]  ]
\b\[eq[i;VALUE] → rplacd[cadr[m];v];return[v];
\\ T → rplaca[cdr[m];v];return[v]] ]]

.END
Notes:

1. %3[m → go[a]]%* is a trick you have seen before.

2. Extended conditional expressions are used here.
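
For example, if the p-list of %3FOO%* (that is, %3cdr[FOO]%*) were %3(COLOR RED SIZE 3)%*,
then %3putprop[FOO;SIZE;5]%* would change it to %3(COLOR RED SIZE 5)%* and return %35%*,
while a subsequent %3putprop[FOO;SHAPE;ROUND]%* would give %3(SHAPE ROUND COLOR RED SIZE 5)%*.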

%3remprop%* is simpler.

.BEGIN INDENT 0,10;
%3remprop%* is a predicate, whose importance lies in its side effects. %3remprop%*
has two arguments, %3n%*, an atom, and %3i%*, an indicator. If the indicator is found on the
p-list of the atom, then %3remprop%* removes the indicator-value pair and returns
%3T%*, otherwise %3remprop%* does nothing and returns %3NIL%*.
.END

.BEGIN SELECT 3;TABIT2(5,8);GROUP;TURN OFF"←";

remprop <= λ[[n;i]
   prog[[m]
\\m ← n;
\a\[eq[cadr[m];i] → rplacd[m;cdddr[m]];return[T]]
\\m ← cddr[m];
\\[m → go[a]]
\\return[NIL] ]]

.END

Applications of %3rplacd%* will occur inside ⊗→%3ratom%*↔← and
inside the ⊗→garbage collector↔←.  For example the ⊗→sweep phase↔← might
be  described as:

.BEGIN SELECT 3; GROUP;TURN OFF"←"; TABIT2(22,25);

sweep <= λ[[x;y]prog\[[z]
\\z ← NIL;
\a\[nap[x] → z ← rplacd[x;z]]
\\x ← down[x];
\\[eq[x;y] → return[z]]
\\go[a] ]]

.END

Here %3sweep%* sweeps from %3x%* to %3y%*.  %3nap%* is a predicate
returning %3T%* just in the case that its argument is still marked `not available';
and %3down%* finds the `successor' of its argument in the linear ordering of the
memory cells.

As a final example we will describe a ⊗→fixup↔← mechanism (see {yonss(P7)}) using 
%3rplacd%*. Since our fixups will be  very simple when assembling compiled code, 
we will chain the forward references together via their %3cdr%*-parts. If an atom,
%3n%*, is defined to be a label it will have on its p-list, the indicator
%3SYM%* and a value representing the memory location assigned to %3n%*.
In the case of forward references to a label %3n%* two actions may occur.
Let loc be the memory location to be assigned to the instruction containing the
forward reference.
If this is the first occurrence of a forward reference to %3n%*, 
then the pair %3(UNDEF .(%* loc%3))%1 is added to the property list, and %30%*
is placed in the %3cdr%*-part of loc and the partial instruction is placed
in the %3car%*-part. Thus:

.GROUP SKIP 5;

Any succeeding references to %3n%*  result in changing the  %3cdr%* of the
p-list pair, call it v, to point to 
a cell whose %3car%*-part is  loc and whose %3cdr%*-part is v.  As above 
we deposit the partial instruction in the %3car%*-part and %30%* in the %3cdr%* part
of loc.  Thus the next reference would update our example to:

.GROUP SKIP 10;

When the label %3n%* finally is defined we must perform the fixups, 
delete the %3UNDEF%* pair, and add a pair %3(SYM%* . loc %3)%* on the p-list.

.GROUP SKIP 10;

Here are the functions. %3defloc%* is called when a label has been defined; 
 %3gval%* is called when a label is referenced.

.BEGIN SELECT 3;GROUP;TABIT2(25,32);TURN OFF "←";
defloc <= λ[[lab;loc]prog\[[z]
\\[z ← get[lab;UNDEF] → go[fixup]];
\a\return[putprop[lab;SYM;loc]]
\fixup\[null[z] → rplacd[lab;cdddr[lab]];go[a]]
\\deposit[car[z];plus[examine[car[z]];loc]]
\\z ← cdr[z];
\\go[fixup] ]]
.END

.BEGIN SELECT 3;GROUP;TABIT1(15);
gval <= λ[[lab]\[get[lab;SYM];
\ T → putprop[lab;UNDEF;cons[loc;get[lab;UNDEF]]];0]]
.END

Notes: %3gval%* is full of pitfalls.

%21%*.  %3loc%* is a global variable.

%22%*.  There is no e%41%*; we assume that if p%41%* evaluates to something not
equal to %3NIL%*, then that value is the value of the conditional expression.

%23%*.  Otherwise  the %3putprop%* is executed and %30%* is returned.

See problem III, {yon(P10)}, and problem *** at the end of this section for 
the description of a complete assembler for our latest %3compile%*.

←%2Problems%*

assembler

II. More on %3ratom%*.

Recall the discussion of %3ratom%* in {yonss(P13)} and {yonss(P14)}.
Now that you know about %3rplaca%* and %3rplacd%* write a more detailed
version of %3ratom%*.

III. Recall the function which reverses the top-level elements of a list.
On {yon(P16)} and {yon(P23)}
we wrote it in various styles and with varying degrees of efficiency.
All of these functions used %3cons%*; however it is clear that we should be
able to reverse a list without using %2any%* new cells.
Write an AMBIT algorithm  for such a reversing function. Express this
algorithm as a LISP function. If you use %3prog%* don't use any %3prog%*-variables.

.SS(Numbers,numbers)

In most implementations of LISP, numbers are stored  as very simple kinds of
atoms:  they do not  need print names,  but since  we should probably
allow fixed and floating point representation, we do  need indicators
for these properties.  Thus:

.BEGIN TABIT2(10,45);GROUP;TURN OFF "α";
\fixed-point 1\floating-point 1.0
\|\|
\|--→%7α~~]%1--→%7[%bFIXNUM%7]~~]%1--→%7[~%11%*~]\%1|--→%7α~~]%1--→%7[%bFLONUM%7]~~]%1--→%7[%1201400000000%7]%1

.END
Notice that each actual number is stored in FWS.  This representation
is a  bit expensive.  On  some machines we can use  a coding trick to
improve representation of some  numbers.  Assume that the  addressing
space of the machine  is 2%818%* and that the usual  size of a LISP
core image is significantly smaller, say, N.  Then all memory address
references greater than N  are illegal (and trapped by  the monitor).
What we  will do is use  these illegal addresses to encode  some of the
smaller positive and  negative integers, mapping  zero on the  middle
address, the  positive numbers to  lower addresses and  the negatives
onto  the  higher   addresses.    Thus  these  smaller  integers  are
represented by pointers outside of the normal LISP  addressing space.
This  trick can  considerably decrease  the storage  requirements for
jobs  heavily using small  numbers.  (Numbers are  not usually stored
uniquely).
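
For a concrete, though hypothetical, example: suppose N were 2%817%* = 131072, so that
addresses 131072 through 262143 are illegal.  Taking 196608 as the middle of that
region, the integer i could be represented by the `address' 196608 - i; then 0 maps
to 196608, +1 to 196607, -1 to 196609, and integers of magnitude up to about 2%816%*
need no cell in FWS at all.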




.group
.GROUP SKIP 20
%2←Picture of INUM Space%*
.apart



Most  numerically oriented  programs  are  faced  at some  time  with
overflow.  That is, they attempt  to construct a number  which is too
large to  be represented  in one  machine location.    There are  now
several   versions   of   LISP  which   will   automatically   change
representation when faced with overflow.  This new representation,
called Bignums, could have the following structure:


.group
.group skip 7;
←%2Structure of a BIGNUM%*
.apart



The value of a Bignum whose cells contain n%40%*, n%41%*, ..., n%4m%* is given by:

←n%40%* + n%41%*α + n%42%*α%82%* + ... + n%4m%*α%8m%*
where α-1 is  the largest number  representable in one  machine word.
The intertranslations between Inums, Fixnums or Flonums, and Bignums
are done automatically by %3eval%* and Co.
.SS(Stacks)
.<<implementation of stacks and cg...>>

***moved it   fix up   OR THROW OUT!!!***

****some of this should go into stoage structure***


Notes: The right-side arrows on ##%7α~~~]%*## and ##%7[~~~]~~~]%*##
are used by the garbage collector.  First we sweep from
##%7[~~~~~~]%4TOP%1# to##%7[~~~~~~~~~]%4BOTTOM%1#,##
setting the side arrows to  %7G%4NA%1#;

when we mark, we reset those nodes which are reachable to point to  %7G%4A%1.
Then the final sweep phase will collect all those nodes still marked,
%7G%4NA%1## into a new free space list, pointed to by   FSP .
%7α[~~~~]%4<name>%1## is the atom header;  <name> need not be present.

%7A%4MKR%1## is a pointer used during the garbage collection.

%7A%4P%1## is a pointer used to save the cdr parts of lists during 
the marking phase of the garbage collector.

%7[~~~~]%4TOP%1,## %7[~~~~~~]%4BOTTOM%1##  are used to delimit the boundaries of
free space.

The ⊗→garbage collector↔← has been slightly simplified.  We should mark
more than just  ⊗→OBLIST↔← (the symbol table); for example, what
%7A%4AC1%1 and %7A%4AC2%1 point to should be marked.  There are other 
intermediate results which must be preserved; these will become 
apparent as we proceed.

We must also be careful about the order in which the dotted lines are
`executed'; often we will number the arrows to indicate the order of
execution.

Following this introduction is the main structure of the first LISP garbage 
collector.  The heart of the collector is the marking algorithm.
Some care need be exercised here. Since we are marking binary list
structure in a sequential manner, we need to make a decision as to
whether to mark the ⊗→%3car%*↔←-part first or mark the ⊗→%3cdr%*↔←-part first.
Actually the order in which we do these operations isn't important.
What %2is%* important is that while we are marking the first-chosen
branch, we must remember where the other branch is so that we
can subsequently mark it.  Also since this marking process is 
obviously recursive -- branches may also have branches -- we
must be prepared to remember an arbitrary number of `other branches'.
The pointer, P, is used to help remember these pending branches.

As you chase the arrows in the AMBIT/G marking algorithm, notice
that:

%21.%*  We always mark the %3car%*-branch first. This is usually called
pre-order traversal.

%22.%*  As we prepare to examine the %3car%*-part of a tree we save a pointer
to that node of the tree, using P.

%23.%*  After we have marked the %3car%*-part, we use the information saved by 
P to traverse the %3cdr%*-part.


P is pointing to a sequence of cells linked together by their 
`down arrows'.  When we go down the %3car%*-part, we save the parent-node
in the cell currently being pointed to by P.  Then we set P to its
`down arrow' successor:
.BEGIN GROUP;centerit;
.GROUP SKIP 10;
←%2Push

.END

As long as the %3car%*-part of the structure we are marking points
into free space, we will continue to update P.  If you look back
through the elements which we have added to P you will see a
history of those nodes whose %3car%*-parts have been marked, but whose
%3cdr%*-parts are still to be examined.  When we finally terminate the 
%3car%*-marking we will pick off the last node in P, decrement P, and
then traverse that substructure:

.BEGIN GROUP;centerit;
.GROUP SKIP 10;
←%2Pop

.END
Thus, P and its associated cells are behaving as a stack.

Recall that we are storing information as list structure and thus
have intersecting branches.
Now since we do have intersections, the marking process can be a little 
 clever.  As soon as the marker comes across a previously marked
cell we know that everything below that point in the structure has already
been marked.  We can then terminate the marker on that branch.
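
The same control structure can be sketched in LISP, using a %3prog%*-variable as the
stack.  Assume (hypothetically) that the predicate %3marked%* is true of any node which
is already marked or which need not be marked, and that %3setmark%* sets the mark.  Of
course a real collector cannot %3cons%* up its stack while collecting; the AMBIT/G
version uses the pre-allocated cells chained by their down arrows instead.

.BEGIN SELECT 3;TABIT1(20);GROUP;TURN OFF"←";

mark <= λ[[n]prog\[[p]
\a[marked[n] → go[b]];
\ setmark[n];
\ p ← cons[n;p];
\ n ← car[n];
\ go[a];
\b[null[p] → return[NIL]];
\ n ← cdr[car[p]];
\ p ← cdr[p];
\ go[a] ]]

.END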

Here then, is the simple AMBIT/G garbage collector for LISP followed by
a partial description in LISP.

.CENT(A simple LISP garbage collector in AMBIT/G)

.NEXT PAGE